ABSTRACT


This  paper  studies  the  set  of  equilibrium  payoffs  in  games  with  long- 
and  short-run  players  and  little  discounting.  Because  the  short-run  players 
are  unconcerned  about  the  future,  equilibrium  outcomes  must  always  lie  on 
their  static  reaction  (best  response)  curves.  The  obvious  extension  of  the 
Folk  Theorem  to  games  with  this  constraint  would  simply  include  the  constraint 
in  the  definitions  of  the  feasible  payoffs  and  of  the  minmax  values.  This 
extension  does  obtain  under  the  assumption  that  each  player's  choice  of  a 
mixed  strategy  for  the  stage  game  is  publicly  observable,  but,  in  contrast  to 
standard  repeated  games,  the  limit  value  of  the  set  of  equilibrium  payoffs  is 
different  if  players  can  observe  only  their  opponents'  realized  actions. 


1.    Introduction 

The  "folk  theorem"  for  repeated  games  with  discounting  says  that  (under 
mild  conditions)  each  individually-rational  payoff  can  be  attained  in  a 
perfect  equilibrium  for  a  range  of  discount  factors  close  to  one.   It  has  long 
been  realized  that  results  similar  to  the  folk  theorem  can  arise  if  some  of 
the  players  play  the  constituent  game  infinitely  often  and  others  play  the 
constituent game only once, so long as all of the players are aware of all
previous  play.  A  standard  example  is  the  infinitely-repeated  version  of 
Selten's  [1977]  chain-store  game,  where  a  single  incumbent  faces  an  infinite 
sequence  of  short-run  entrants  in  the  game  depicted  in  Figure  1.  Each  entrant 
cares  only  about  its  one-period  payoff,  while  the  incumbent  maximizes  its  net 
present  value.  For  discount  factors  close  to  one  there  is  a  perfect 
equilibrium  in  which  entry  never  occurs,  even  though  this  is  not  a  perfect 
equilibrium  if  the  game  is  played  only  once  or  even  a  fixed  finite  number  of 
times.   In  this  equilibrium,  each  entrant's  strategy  is  "stay  out  if  the 
incumbent  has  fought  all  previous  entry;  otherwise,  enter;"  and  the 
incumbent's strategy is "fight each entry as long as entry has always been
fought  in  the  past,  otherwise  acquiesce."  Other  examples  of  games  with  long 
and  short  run  players  are  the  papers  of  Dybvig-Spatt  [1980]  and  Shapiro  [1982] 
on  a  firm's  reputation  for  producing  high-quality  goods  and  the  papers  of 
Simon  [1951]  and  Kreps  [1984]  on  the  nature  of  the  employment  relationship. 
This  paper  studies  the  set  of  equilibrium  payoffs  in  games  with  long- 
and  short-run  players  and  little  discounting.  This  set  differs  from  what  it 
would  be  if  all  players  were  long-run,  as  demonstrated  by  the  prisoner's 
dilemma  with  one  enduring  player  facing  a  sequence  of  short-run  opponents. 
Because the short-run players will fink in every period, the only equilibrium
is the static one, no matter what the discount factor. In general, because
the  short-run  players  are  unconcerned  about  the  future,  equilibrium  outcomes 
must  always  lie  on  their  static  reaction  (best  response)  curves.  This  is  also 
true  off  of  the  equilibrium  path,  so  the  reservation  values  of  the  long-run 
players  are  higher  when  some  of  their  opponents  are  short-run,  because  their 
punishments  must  be  drawn  from  a  smaller  set. 

The  perfect  folk  theorem  for  discounted  repeated  games  (Fudenberg-Maskin 
[1986])  shows  that,  under  a  mild  full-dimensionality  condition,  any  feasible 
payoffs  that  give  all  players  more  than  their  minmax  values  can  be  attained  by 
a  perfect  equilibrium  if  the  discount  factor  is  near  enough  to  one.    The 
obvious  extension  of  this  result  to  games  with  the  constraint  that  short-run 
players  always  play  static  best  responses  would  simply  include  that  constraint 
in  the  definitions  of  the  feasible  payoffs  and  of  the  minmax  values. 
Propositions 1 and 2 of Section 2 show that this extension does obtain under
the  assumption  that  each  player's  choice  of  a  mixed  strategy  for  the  stage 
game  is  publicly  observable. 

We  then  turn  to  the  more  realistic  case  in  which  players  observe  only 
their  opponents'  realized  actions  and  not  their  opponents'  mixed  strategies. 
While  in  standard  repeated  games  the  folk  theorem  obtains  in  either  case,  when 
there  are  some  short-run  players  the  set  of  equilibria  can  be  strictly  smaller 
if  mixed  strategies  are  not  observed.  The  explanation  for  this  difference  is 
that  in  ordinary  repeated  games,  while  mixed  strategies  may  be  needed  during 
punishment  phases,  they  are  not  necessary  along  the  equilibrium  path.  In 
contrast,  with  short-run  players  some  best  responses,  and  thus  some  of  the 
feasible  payoffs,  can  only  be  obtained  if  the  long-run  players  use  mixed 
strategies.  If  the  mixed  strategies  are  not  observable,  inducing  the  long-run 
players to randomize may require that "punishments" occur with positive
probability even if no player has deviated. For this reason the set of
equilibrium  payoffs  may  be  bounded  away  from  the  frontier  of  the  feasible  set. 

Proposition  3  of  Section  3  provides  a  complete  characterization  of  the 
limiting  value  of  the  set  of  equilibrium  payoffs  for  a  single  long-run  player. 
This  characterization,  and  the  results  of  Section  2,  assume  that  players  have 
access  to  a  publicly  observable  randomizing  device.   The  device  is  used  to 
implement  strategies  of  the  form:   if  player  i  deviates,  then  players  jointly 
switch  to  a  "punishment  equilibrium"  with  some  probability  p  <  1.  While  the 
assumption  of  public  randomizations  is  not  implausible,  it  is  interesting  to 
know  whether  it  leads  to  a  larger  limit  set  of  equilibrium  payoffs. 
Proposition  4  in  Section  4  shows  that  it  does  not:   We  construct  "target 
strategies"  in  which  a  player  is  punished  with  probability  one  whenever  his 
discounted payoff to date exceeds a target value, and show that these
strategies  can  be  used  to  obtain  as  an  equilibrium  any  of  the  equilibrium 
payoffs  that  were  obtained  via  public  randomizations  in  Proposition  3. 

Proposition 3 shows that not all feasible payoffs can be obtained as
equilibria, so in particular we know that some payoffs cannot be obtained
with  the  target  strategies  of  Proposition  4.   Inspection  of  that  construction 
shows  that  it  fails  for  payoffs  that  are  higher  than  what  the  long-run  player 
can  obtain  with  probability  one  given  the  incentive  constraint  of  the 
short-run  players:  For  payoffs  this  high,  there  is  a  positive  probability 
that  player  1  will  suffer  a  run  of  "bad  luck"  after  which  no  possible  sequence 
of  payoffs  could  draw  his  discounted  normalized  value  up  to  the  target.  As 
this  problem  does  not  arise  under  the  criterion  of  time-average  payoffs,  one 
might  wonder  if  the  set  of  equilibrium  payoffs  is  larger  under  time-averaging. 
Proposition  5  shows  that  the  answer  is  yes.   In  fact,  any  feasible  incentive 
compatible payoffs can arise as equilibria with time-averaging, so that we
obtain the same set of payoffs as in the case where the players' privately
mixed  strategies  are  observable.  This  discontinuity  of  the  equilibrium  set  in 
passing  from  discounting  to  time  averaging  is  reminiscent  of  a  similar 
discontinuity  that  has  been  established  for  the  equilibria  of  repeated 
partnership  games  (Radner  [1986],  Radner-Myerson-Maskin  [1986]).  The 
relationship  between  the  two  models  is  discussed  further  in  Section  5. 

We  have  not  solved  the  case  of  several  long-run  players  and  unobservable 
mixed  strategies.   Section  5  gives  an  indication  of  the  additional 
complications  that  this  case  presents. 

2.    Observable  Mixed  Strategies 

Consider  a  finite  n-player  game  in  normal  form, 


$$g: S_1 \times \cdots \times S_n \to \mathbb{R}^n.$$


We denote player i's mixed strategies by $\sigma_i \in \Sigma_i$, and write $g(\sigma)$ for the
expected value of g under distribution $\sigma$.

In  this  section  we  assume  that  a  player  can  observe  the  others'  past 
mixed  strategies.  This  assumption  (or  a  restriction  to  pure  strategies)  is 
standard  in  the  repeated  games  literature,  but  as  Fudenberg-Maskin  [1986] 
[1987a]  have  shown,  it  is  not  necessary  there.   (Here  it  matters  —  see  the 
next  section!)   We  will  also  assume  that  the  players  can  make  their  actions 
contingent on the outcome of a publicly observable randomizing device.

Label  the  players  so  that  players  1  to  j  are  long-run  and  j+1  to  n  are 
short-run.  Let 


$$B: \Sigma_1 \times \cdots \times \Sigma_j \rightrightarrows \Sigma_{j+1} \times \cdots \times \Sigma_n$$

be the correspondence which maps any strategy selection $(\sigma_1, \ldots, \sigma_j)$ for the
long-run players to the corresponding Nash equilibrium strategy selections for
the short-run players. If there is only one short-run player, $B(\sigma)$ is his
best response correspondence.

For each i from 1 to j, choose $m^i = (m^i_1, \ldots, m^i_n)$ so that $m^i$ solves

$$\min_{m \in \mathrm{graph}(B)} \max_{\sigma_i} g_i(\sigma_i, m_{-i}),$$

and set

$$\underline{v}_i = \max_{\sigma_i} g_i(\sigma_i, m^i_{-i}).$$

(This minimum is attained because the constraint set graph(B) is compact and
the function $\max_{\sigma_i} g_i(\sigma_i, m_{-i})$ is continuous in $m_{-i}$.)

The strategies $m^i_{-i}$ minimize long-run player i's maximum attainable
payoff over the graph of B. The restriction to this set reflects the
constraint that the short-run players will always choose actions that are
short-run optimal. Given this constraint, no equilibrium of the repeated game
can give player i less than $\underline{v}_i$. (In general, the short-run players could
force player i's payoff even lower using strategies that are not short-run
optimal.) Note that $m^i$ specifies player i's strategy $m^i_i$, which need not be
a best response to $m^i_{-i}$: Player i must play in a certain way to induce the
short-run players to attain the minimum in the definition of $m^i$. In order to
construct equilibria in which player i's payoff is close to $\underline{v}_i$, player i will
need to be provided with an incentive to cooperate in his own punishment.


In the repeated version of g, we suppose that long-run players maximize
the discounted normalized sum of their single-period payoffs, with common
discount factor $\delta$. That is, long-run player i's payoff is

$$(1-\delta) \sum_{t=0}^{\infty} \delta^t g_i(\sigma(t)).$$

Short-run players in each period act to maximize that period's payoff. All
players, both long- and short-run, can condition their play on all previous
actions.
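
As a purely illustrative aside, the normalization can be checked numerically.
The Python sketch below truncates the infinite sum at a finite horizon; the
function name and the sample payoff streams are hypothetical choices made for
this illustration and are not part of the model. It shows that a constant
stream of 2 has normalized value close to 2, and that a one-period gain of 2
in period 0 adds only $(1-\delta) \cdot 2$ to the normalized payoff.

```python
def normalized_discounted_payoff(stage_payoffs, delta):
    """Return (1 - delta) * sum_t delta**t * g_t for a finite list of stage payoffs.

    The finite list stands in for the infinite stream, so the result is a
    truncation of the discounted normalized sum defined in the text.
    """
    return (1 - delta) * sum(delta**t * g for t, g in enumerate(stage_payoffs))


delta = 0.9  # hypothetical discount factor

# A constant stream of 2 per period (500 periods approximate the infinite horizon).
constant = normalized_discounted_payoff([2.0] * 500, delta)

# The same stream, except that the period-0 payoff is 4 instead of 2.
one_shot_gain = normalized_discounted_payoff([4.0] + [2.0] * 499, delta)

print(constant, one_shot_gain - constant)
```

As $\delta$ approaches one the weight $(1-\delta)$ on any single period vanishes,
which is why long punishment phases can outweigh a one-period gain.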

Let $U = \{ v = (v_1, \ldots, v_j) \mid \exists \sigma \in \mathrm{graph}(B) \text{ with } g(\sigma) = v \}$;

let $V$ = convex hull of $U$;

and let $V^* = \{ v \in V \mid \text{for all } i \text{ from 1 to } j,\ v_i > \underline{v}_i \}$.

We call payoffs in V attainable payoffs for the long-run players. Only
payoffs in V* can arise in equilibrium. We begin with the case of a single
long-run player.

Proposition 1: If only player one is a long-run player, then for any
$v_1 \in V^*$ there exists a $\underline{\delta} \in (0,1)$ such that for all $\delta \in (\underline{\delta},1)$, there is a
subgame-perfect equilibrium of the infinitely repeated game with discount
factor $\delta$ in which player 1's discounted normalized payoff is $v_1$.


Proof: Fix a $v_1 \in V^*$ and consider the following strategies. Begin in
Phase A, where players play a $\sigma \in \mathrm{graph}(B)$ (or a public randomization over
such $\sigma$'s) that gives player 1 payoff $v_1$. Deviations by the short-run players
are ignored. If player one deviates, he is punished by players switching to
the punishment strategy $m^1$ for $T(\delta)$ periods, after which play returns to Phase
A; if $T(\delta)$ is large enough, deviations in Phase A are unprofitable. Now $m^1_1$
need not be a best response against $m^1_{-1}$, so we must ensure that player one
does not prefer to deviate during the punishment phase. This is done by
specifying that a deviation in this phase restarts the punishment. Since the
most that player 1 can obtain in any period of the punishment phase is $\underline{v}_1$, he
will prefer not to deviate so long as $T(\delta)$ is short enough that player 1's
normalized payoff at the start of the punishment phase is at least $\underline{v}_1$. Let
$\bar{v}_1 = \max_{\sigma \in \mathrm{graph}(B)} g_1(\sigma)$. The two constraints on $T(\delta)$ will be satisfied if:

(1) $(1-\delta)\bar{v}_1 + \delta(1-\delta^{T(\delta)}) g_1(m^1) + \delta^{T(\delta)+1} v_1 \le v_1$, or equivalently

(1') $\delta^{T(\delta)+1} \le (v_1 - \delta g_1(m^1) - (1-\delta)\bar{v}_1)/(v_1 - g_1(m^1))$, and

(2) $(1-\delta^{T(\delta)}) g_1(m^1) + \delta^{T(\delta)} v_1 \ge \underline{v}_1$, or equivalently

(2') $\delta^{T(\delta)} \ge (\underline{v}_1 - g_1(m^1))/(v_1 - g_1(m^1))$.

The right-hand sides of inequalities (1') and (2') have the same denominator,
and for $\delta$ close to 1 the numerator of (1') exceeds the numerator of (2').
Then since $\delta^T$ is approximately continuous in T for $\delta$ close to 1, we can find a
$\underline{\delta} < 1$ such that for all greater $\delta$ there is a $T(\delta)$ satisfying (1') and (2').

Q.E.D.
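
To make the two constraints on the punishment length concrete, the following
Python sketch enumerates the integer lengths T that satisfy (1) and (2)
directly. The function name and all numerical values ($v_1 = 2$,
$\underline{v}_1 = 1$, $\bar{v}_1 = 4$, $g_1(m^1) = 0$) are hypothetical and chosen only for
illustration; they are not taken from any game in the paper.

```python
def feasible_punishment_lengths(v, v_min, v_bar, g_m, delta, t_max=200):
    """List the integers T in 1..t_max that satisfy constraints (1) and (2).

    v      target payoff v_1 in Phase A
    v_min  constrained minmax value of player 1
    v_bar  player 1's largest one-period payoff
    g_m    player 1's per-period payoff g_1(m^1) while being punished
    """
    feasible = []
    for T in range(1, t_max + 1):
        # (1): deviating once in Phase A and then being punished is not profitable.
        deters_deviation = (
            (1 - delta) * v_bar
            + delta * (1 - delta**T) * g_m
            + delta**(T + 1) * v
            <= v
        )
        # (2): the value at the start of the punishment is at least the minmax value.
        accepts_punishment = (1 - delta**T) * g_m + delta**T * v >= v_min
        if deters_deviation and accepts_punishment:
            feasible.append(T)
    return feasible


patient = feasible_punishment_lengths(v=2.0, v_min=1.0, v_bar=4.0, g_m=0.0, delta=0.95)
impatient = feasible_punishment_lengths(v=2.0, v_min=1.0, v_bar=4.0, g_m=0.0, delta=0.5)
print(patient, impatient)
```

With these illustrative numbers a patient player admits a whole range of
punishment lengths, while for $\delta = 0.5$ no length satisfies both
constraints, in line with the requirement that $\delta$ be close to one.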


In repeated games with three or more players, a full-dimensionality
condition is required for all feasible individually rational payoffs to be
enforceable when $\delta$ is near enough to one. The corresponding condition here is
that the dimensionality of V* equals the number of long-run players.


Proposition 2: Assume that the dimensionality of V* equals j, the number of
long-run players. Then for each v in V*, there is a $\underline{\delta} \in (0,1)$ such that for all
$\delta \in (\underline{\delta},1)$ there is a subgame-perfect equilibrium of the infinitely repeated game
with discount factor $\delta$ in which player i's normalized payoff is $v_i$.

Remark: The proof of Proposition 2 follows that of Fudenberg-Maskin's Theorem
2:  If  a  (long-run)  player  deviates,  he  is  punished  long  enough  to  wipe  out  the 
gain  from  deviation.  To  induce  the  other  (long-run)  players  to  punish  him, 
they  are  given  a  "reward"  at  the  end  of  the  punishment  phase.  One  small 
complication  not  present  in  Fudenberg-Maskin  is  that,  as  in  Proposition  1,  the 
player  being  punished  must  take  an  active  role  in  his  punishment.   This, 
however,  can  be  arranged  with  essentially  the  same  strategies  as  before. 

Proof: Choose a $\sigma$ (or a public randomization over several $\sigma$'s) so that $g(\sigma) = v$.
Also choose v' in the interior of V* and an $\epsilon > 0$ so that for all i from 1
to j, $(v'_1 + \epsilon, \ldots, v'_{i-1} + \epsilon, v'_i, v'_{i+1} + \epsilon, \ldots, v'_j + \epsilon)$ is in V* and $v'_i + \epsilon < v_i$.
Let $\tau^i$ be a joint strategy that yields $v'_k + \epsilon$ to each long-run player k other than
i, and yields $v'_i$ to i. Let $w^j_i = g_i(m^j)$ be player i's period payoff when j is
being punished with the strategies $m^j$. For each i, choose an integer $N_i$ so
that

$$\bar{v}_i + N_i \underline{v}_i < (N_i + 1) v'_i,$$

where $\bar{v}_i = \max g_i$ is i's greatest one-period payoff.


Consider the following repeated-game strategy for player i:

(0) Obey the following rules regardless of how the short-run players have
played in the past:

(A) Play $\sigma_i$ each period as long as all long-run players played $\sigma$ last
period, or if $\sigma$ had been played until last period and two or more long-run
players failed to play $\sigma$ last period.

If long-run player j deviates from (A), then
($B_j$) Play $m^j_i$ for $N_j$ periods, and then
($C_j$) Play $\tau^j_i$ thereafter.

If long-run player k deviates in phase ($B_j$) or ($C_j$), then begin phase
($B_j$) again with j = k. (As in phase A, players ignore simultaneous deviations
by two or more long-run players.)

As usual, it suffices to check that in every subgame no player can gain
from deviating once and then conforming. The condition on $N_i$ ensures that for
$\delta$ close to one, the gain from deviating in Phase A or Phase C is outweighed by
Phase B's punishment. If player j conforms in $B_j$ (i.e. when she is being
punished) her payoff is at least $q_j = (1-\delta^{N_j}) w^j_j + \delta^{N_j} v'_j$, which exceeds $\underline{v}_j$ if $\delta$ is
close enough to one. If she deviates once and then conforms, she receives at
most $\underline{v}_j$ the period she deviates, and postpones the payoff $q_j > \underline{v}_j$, which
lowers her payoff. If player k deviates in Phase $B_j$, she is minmaxed for the
next $N_k$ periods and Phase-C play will give her $v'_k$ instead of $v'_k + \epsilon$. Thus it
is easy to show that such a deviation is unprofitable. (See Fudenberg-Maskin
for the missing computations.)


3.    Unobservable  Mixed  Strategies 

We  now  drop  the  assumption  that  players  can  observe  their  opponents' 
mixed  strategies,  and  instead  assume  they  can  only  observe  their  opponents' 
realized  actions.   In  ordinary  repeated  games,  (privately)  mixed  strategies 
are  needed  during  punishment  phases,  because  in  general  a  player's  minmax 
value  is  lower  when  his  opponents  use  mixed  strategies.   However,  mixed 
strategies  are  not  required  along  the  equilibrium  path,  since  desired  play 
along  the  path  can  be  enforced  by  the  threat  of  future  punishments. 
Fudenberg-Maskin  showed  that,  under  the  full-dimension  condition  of 
Proposition  2,  players  can  be  induced  to  use  mixed  strategies  as  punishments 
by  making  the  continuation  payoffs  at  the  end  of  a  punishment  phase  dependent 
on  the  realized  actions  in  that  phase  in  such  a  way  that  each  action  in  the 
support  of  the  mixed  strategy  yields  the  same  overall  payoff. 

In  contrast,  with  short-run  players  some  payoffs  (in  the  graph  of  B)  can 
only  be  obtained  if  the  long-run  players  privately  randomize,  so  that  mixed 
strategies  are  in  general  required  along  the  equilibrium  path.  As  a 
consequence,  the  set  of  equilibrium  payoffs  in  the  repeated  game  can  be 
strictly  smaller  when  mixed  strategies  are  not  observable.   This  is 
illustrated by the following example of a game with one long-run player, Row,
and one short-run player, Col.

Let p be the probability that Row plays D. Col's best response is M if
$0 \le p \le 1/2$, L if $1/2 \le p \le 100/101$, and R if $p \ge 100/101$. There are three
static equilibria: the pure strategy equilibrium (D,R), a second in which p =
1/2 and Col mixes between M and L, and a third in which p = 100/101 and Col
mixes between L and R. Row's maximum attainable payoff is 3, which occurs
when p = 1/2 and Col plays L.

                 L            M            R

     U         4, 0         0, 1       -1, -100

     D         2, 2         1, 1         0, 3

                        Figure 1
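
The best-response thresholds quoted above can be verified mechanically. The
Python sketch below assumes that the payoff pairs of the figure are arranged
with rows U, D and columns L, M, R, which is the arrangement consistent with
the thresholds 1/2 and 100/101 stated in the text; the function names are
hypothetical. It uses exact rational arithmetic so that ties at the
thresholds are detected exactly.

```python
from fractions import Fraction

# Assumed stage-game payoffs (Row's payoff, Col's payoff).
PAYOFFS = {
    ("U", "L"): (4, 0), ("U", "M"): (0, 1), ("U", "R"): (-1, -100),
    ("D", "L"): (2, 2), ("D", "M"): (1, 1), ("D", "R"): (0, 3),
}


def col_payoff(p, col):
    """Col's expected payoff from `col` when Row plays D with probability p."""
    return (1 - p) * PAYOFFS[("U", col)][1] + p * PAYOFFS[("D", col)][1]


def col_best_responses(p):
    """Set of Col's pure best responses when Row plays D with probability p."""
    values = {c: col_payoff(p, c) for c in "LMR"}
    best = max(values.values())
    return {c for c, value in values.items() if value == best}


def row_payoff(p, col):
    """Row's expected payoff when she plays D with probability p against `col`."""
    return (1 - p) * PAYOFFS[("U", col)][0] + p * PAYOFFS[("D", col)][0]


def worst_support_payoff(col):
    """Row's lowest payoff over U and D against `col` (she mixes over both)."""
    return min(PAYOFFS[("U", col)][0], PAYOFFS[("D", col)][0])


for p in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(100, 101), Fraction(1)):
    print(p, sorted(col_best_responses(p)))
print(row_payoff(Fraction(1, 2), "L"), worst_support_payoff("L"))
```

The last line contrasts Row's expected payoff of 3 at p = 1/2 against L with
the payoff of 2 from the worse action in the support of her mixture; the gap
between these two numbers is what the unobservability of mixed strategies
costs her in the discussion that follows.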

If Row's mixed strategy is observable, she can attain this payoff in the
infinitely repeated game if $\delta$ is near enough to 1. If however Row's mixed
strategy is not observable, her highest equilibrium payoff is at most 2
regardless of $\delta$.

To see this, fix a discount factor $\delta$, and let $v^*(\delta)$ be the supremum over
all Nash equilibria of Row's equilibrium payoff. Suppose that for some $\delta$,
$v^*(\delta) = 2 + \epsilon' > 2$, and choose an equilibrium $\hat{s}$ such that player 1's payoff is
$v(\hat{s}) = v^*(\delta) - \epsilon > 2$. It is easy to see that the set of equilibrium payoffs
is stationary: Any equilibrium payoff is an equilibrium payoff for any
subgame, and conversely. Thus, the highest payoff player 1 can obtain
starting from period 2 is also bounded by $v^*(\delta)$. Since $v(\hat{s})$ is the weighted
average of player 1's first-period payoff and her expected continuation
payoff, player 1's first-period payoff must be at least $v^*(\delta) - \epsilon/(1-\delta)$. For $\epsilon$
sufficiently small, this implies that player 1's first period payoff must
exceed 2.

In order for Row's first-period payoff to be at least 2, Col must play L
with positive probability in the first period. As Col will only play L if Row
randomizes between U and D, Row must be indifferent between her first period
choices, and in particular must be willing to play D. Let $v_D$ be Row's
expected payoff from period 2 on if she plays D in the first period. Then we
must have

(3) $2(1-\delta) + \delta v_D = v^*$.

But since $v_D \le v^*$, substituting into (3) gives $v^* \le 2(1-\delta) + \delta v^*$, and we
conclude that $v^* \le 2$.

While  Row  cannot  do  as  well  as  if  her  mixed  strategies  were  observable, 
she can still gain by using mixed strategies. For $\delta$ near enough to one there
is an equilibrium which gives Row a normalized payoff of 2, while Row's best
payoff  when  restricted  to  pure  strategies  is  the  static  equilibrium  yielding 
1.  To  induce  Row  to  mix  between  U  and  D,  specify  that  following  periods  when 
Col  expects  mixing  and  Row  plays  U,  play  switches  with  probability  p  to  (D,R) 
for  ten  periods  and  then  reverts  to  Row  randomizing  and  Col  playing  L.  The 
probability  p  is  chosen  so  that  Row  is  just  indifferent  between  receiving  2 
for  the  next  eleven  periods,  or  receiving  4  today  and  risking  punishment  with 
probability  p.   This  construction  works  quite  generally,  as  shown  in  the 
following  proposition. 
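
The switching probability in this construction can be computed in closed
form. The Python sketch below is only an illustration under our own reading of
the construction: Row's value in the normal phase is 2, a ten-period
punishment at (D,R) yields her 0 per period, and after playing U (payoff 4)
play moves to the punishment with probability p. Solving the indifference
condition $(1-\delta) \cdot 4 + \delta[(1-p) \cdot 2 + p \cdot 2\delta^{10}] = 2$ gives
$p = (1-\delta)/(\delta(1-\delta^{10}))$. The function name and default arguments are
hypothetical.

```python
def switch_probability(delta, punish_len=10, v_normal=2.0, u_payoff=4.0, punish_payoff=0.0):
    """Probability of starting the punishment after U that leaves Row indifferent.

    Returns None when the indifference condition would need a probability
    outside [0, 1], i.e. when delta is too small for the construction.
    """
    # Row's normalized value at the start of the punishment phase.
    w = (1 - delta**punish_len) * punish_payoff + delta**punish_len * v_normal
    # Solve (1 - delta) * u_payoff + delta * ((1 - p) * v_normal + p * w) = v_normal.
    p = (1 - delta) * (u_payoff - v_normal) / (delta * (v_normal - w))
    return p if 0.0 <= p <= 1.0 else None


p_patient = switch_probability(0.95)
p_impatient = switch_probability(0.5)
print(p_patient, p_impatient)
```

For $\delta = 0.95$ the required probability is a little above 0.13, and it
tends to 1/10 as $\delta$ tends to one; for $\delta = 0.5$ no probability
below one suffices, which is why the construction needs $\delta$ near one.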

Proposition 3: Consider a game with a single long-lived player, player
one, and let

$$v_1^* = \max_{\sigma \in \mathrm{graph}(B)} \min_{s_1 \in \mathrm{supp}(\sigma_1)} g_1(s_1, \sigma_{-1}).$$

Then for any $v_1 \in (\underline{v}_1, v_1^*)$ there exists a $\delta' < 1$ such that for all $\delta \in (\delta', 1)$,
there is an equilibrium in which player one's normalized payoff is $v_1$. For no
$\delta$ is there an equilibrium where player one's payoff exceeds $v_1^*$.

Proof: We begin by constructing a "punishment equilibrium" in which player
one's normalized payoff is exactly $\underline{v}_1$. If $\underline{v}_1$ is player one's payoff in a
static equilibrium this is immediate, so assume all the static equilibria
give player one more than $\underline{v}_1$. The strategies we will use have two phases.
The game begins in phase A, where the players use $m^1$, a strategy which holds
player one's maximum one-period payoff to $\underline{v}_1$. If player one plays $s_1$, players
publicly randomize between remaining in phase A and switching to a static Nash
equilibrium for the remainder of the game. If $e_1$ is one's payoff in this
static equilibrium, set the probability of switching after $s_1$, $p(s_1)$, to be

(4) $p(s_1) = \dfrac{(1-\delta)(\underline{v}_1 - g_1(s_1, m^1_{-1}))}{\delta (e_1 - \underline{v}_1)}$.

(If $\delta$ is near enough to one, $p(s_1)$ is between 0 and 1.)

The switching probability has been constructed so that player one's
normalized profit is $\underline{v}_1$ for all actions, including those in the support of $m^1_1$,
so she is indifferent between these actions; that is, $p(s_1)$ solves
$(1-\delta) g_1(s_1, m^1_{-1}) + \delta[(1 - p(s_1))\underline{v}_1 + p(s_1) e_1] = \underline{v}_1$.

Next we construct strategies yielding $v_1^*$ for $v_1^* > \underline{v}_1$. Let $\sigma^* = (\sigma_1^*, \sigma_{-1}^*)$
be the corresponding mixed strategies. Play begins in phase A with players
following $\sigma^*$. If player one deviates to an action outside the support of $\sigma_1^*$
then switch to the "punishment equilibrium" constructed above. If player one
plays an action $s_1$ in the support of $\sigma_1^*$, then switch to the punishment
equilibrium with probability $p(s_1)$, and otherwise remain in phase A. The
probability $p(s_1)$ is chosen so that player one's payoff to all actions in the
support of $\sigma_1^*$ is $v_1^*$. As above, this probability exists if $\delta$ is near enough to
one. These strategies are clearly an equilibrium for large $\delta$.

Equilibrium payoffs between $\underline{v}_1$ and $v_1^*$ are obtained by using public
randomizations between those two values. The argument that player one's payoff
cannot exceed $v_1^*$ is exactly as in the example.


4.  No  Public  Randomizations 

The equilibria that we constructed in the proofs of Propositions 1 through 3
relied on our assumption that players can condition their play on the outcome
of a publicly observed random variable. While that assumption is not
implausible, it is also of interest to know whether the assumption is
necessary for our results. For this reason, Proposition 4 below extends
Proposition 3 to games without public randomizations. (We have not thought
about the possible extension of Propositions 1 and 2 because we think the
situation without public randomizations but where private randomization can be
verified ex-post is without interest.) The intuition, as explained in
Fudenberg-Maskin [1987c], is that public randomizations serve to convexify the
set of attainable payoffs, and when $\delta$ is near to 1 this convexification can be
achieved by sequences of play which vary over time in the appropriate way.
Fudenberg-Maskin [1987c] shows that public randomizations are not necessary
for the proof of the perfect Folk Theorem. However, as we have already seen,
there are important differences between classic repeated games and repeated
games with some short-run players, so the fact that public randomizations are
not needed for the folk theorem should not be thought to settle the question
here.

Proposition 4: Consider a game with a single long-run player, player 1,
where public randomizations are not available. As in Proposition 3, let

$$v_1^* = \max_{\sigma \in \mathrm{graph}(B)} \min_{s_1 \in \mathrm{supp}(\sigma_1)} g_1(s_1, \sigma_{-1}),$$

and let $\sigma^*$ be a strategy that attains this max. Then, for any $v_1 \in (\underline{v}_1, v_1^*)$
there exists a $\delta' < 1$ such that for all $\delta \in (\delta', 1)$ there is a subgame-perfect
equilibrium where player 1's discounted normalized payoff is $v_1$.


Remark: Fix a static Nash equilibrium $\hat{\sigma}$ with payoffs $\hat{v}_1$. For each $v_1$ the
proof constructs strategies that keep track of the agent's total realized
payoff to date t and compares it to the "target" value of $(1-\delta^t) v_1$, which is
what the payoff to date would be if the agent received $v_1$ in every period. If
$v_1$ exceeds the payoff in a static equilibrium, then play initially follows the
(possibly mixed) strategy $\sigma^*$, and whenever the realized total is sufficiently
greater than the target value, the agent is "punished" by reversion to the
static equilibrium. If the target is less than the static equilibrium, then
play starts out at the (possibly mixed) strategy $m^1$, with intermittent
"rewards" of the static equilibrium whenever the realized payoff drops too
low.


Proof:

(A) It is trivial to obtain the static equilibrium payoff $\hat{v}_1$ as an
equilibrium payoff.

(B) To attain any payoff $v_1$ between $\hat{v}_1$ and $v_1^*$ we proceed as follows.
Renormalize the payoffs so that $\hat{v}_1 = 0$, and take $\delta$ large enough
that $(1-\delta)\bar{v}_1 < \delta v_1$, where $\bar{v}_1$ is player 1's greatest one-period payoff.
Define $J_0 = 0$ and $\hat{s}(0) = \sigma^*$, and for each time t>0
define the strategies $\hat{s}(h_t)$ and an index $J_t$ as follows:

$$J_t = J_{t-1} + (1-\delta)\delta^{t-1} g_1(s_1(t-1), \hat{s}_{-1}(h_{t-1})),$$

where $s_1(t-1)$ is player 1's action in period t-1 (as opposed to his choice of
mixed strategy), and $R_t = (1-\delta^t) v_1$. If player 1's payoff were $v_1$ each period,
then his accrued payoff $J_t$ would equal $R_t$. The equilibrium strategies will
"punish" the agent whenever $J_t$ exceeds $R_t$ by too large a margin. More
precisely, we define


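$$\hat{s}(h_t) = \begin{cases}
\hat{\sigma} & \text{if } J_t \ge R_{t+1} \text{ and } J_\tau \ge R_\tau \text{ for all } \tau \le t, \\
\sigma^* & \text{if } J_t < R_{t+1} \text{ and } J_\tau \ge R_\tau \text{ for all } \tau \le t, \\
\hat{\sigma} & \text{if } J_\tau < R_\tau \text{ for any } \tau \le t,
\end{cases}$$

where $\hat{\sigma}$ is the static Nash equilibrium fixed in the Remark.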


Note that since $J_t$ is a discounted sum, for each infinite history $h_\infty$,
$J_t$ converges to a limit $J_\infty$. Moreover, as long as the other players use
strategy $\hat{s}_{-1}$, player 1's payoff to any strategy is simply the expected value
of $J_\infty$, and his expected payoff in any subgame starting at time t is
$\delta^{-t} E[J_\infty - J_t \mid h_t]$.

We will now argue that (i) if player 1 uses strategy $\hat{s}_1$, then $J_t \ge R_t$
for all times t and histories $h_t$, which implies that $J_\infty \ge v_1$, (ii) that
regardless of how player 1 plays, $J_\infty \le v_1$, so player 1's payoff in the
subgame starting at time t is bounded by $\delta^{-t}(v_1 - J_t)$ for all histories $h_t$, and
(iii) that in any subgame where at some $\tau \le t$ $J_\tau < R_\tau$, it is a best
response for all players to follow the prescribed strategy of always playing
the static equilibrium $\hat{\sigma}$, and so $J_\infty \le v_1$.

Conditions (i) and (ii) imply that it is a best response for player
1 to play $\hat{s}_1$ in every subgame where $J_t$ has never dropped below $R_t$, and that
player 1's equilibrium payoff is $v_1$. Condition (iii), whose proof is
immediate, says that following $\hat{s}$ is also a Nash equilibrium in subgames
where $J_t$ has dropped below $R_t$, so that $\hat{s}$ is a subgame-perfect
equilibrium. (The condition that the short-run players not wish to deviate is
incorporated in the construction of $\hat{s}$.)


Proof of (i): We must show that if player 1 follows $\hat{s}_1$ then for all t,
$J_t \ge (1-\delta^t) v_1 = R_t$. Since $J_0 = 0$, this is true for t=0. Assume it is true for
$t = \tau$. At period $\tau$, either (a) $J_\tau \ge R_{\tau+1}$ or (b) $J_\tau < R_{\tau+1}$. In case (a),
$\hat{s}(h_\tau) = \hat{\sigma}$. Since $g_1(s_1, \hat{\sigma}_{-1}) = 0$ for every pure strategy $s_1$ in the support of
$\hat{\sigma}_1$, we have $J_{\tau+1} = J_\tau$, and $J_\tau \ge R_{\tau+1}$ by the hypothesis of case (a). In case (b),
$\hat{s}(h_\tau) = \sigma^*$, so $\min \{ g_1(s_1(\tau), \hat{s}_{-1}(h_\tau)) \mid s_1(\tau) \in \mathrm{support}(\hat{s}_1(h_\tau)) \} = v_1^*$,
and

$$J_{\tau+1} \ge J_\tau + (1-\delta)\delta^\tau v_1^* \ge (1-\delta^\tau) v_1 + (1-\delta)\delta^\tau v_1 = (1-\delta^{\tau+1}) v_1 = R_{\tau+1},$$

where the second inequality comes from the inductive hypothesis and $v_1 \le v_1^*$.
Thus $J_t \ge (1-\delta^t) v_1$ for all t, and so if player 1 follows $\hat{s}_1$ then $J_\infty \ge v_1$.
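
The bookkeeping behind the target strategy can be illustrated by simulation.
The Python sketch below is a hypothetical instance, not part of the proof: it
takes the example game of Section 3 (static equilibrium payoff 0 at (D,R), and
$\sigma^*$ with Col playing L while Row plays D with probability 3/4, so that
$v_1^* = 2$), sets the target $v_1 = 1$ and $\delta = 0.9$, and assumes Row
always follows the prescribed mixed action. Under that assumption the index
never drops below the target path, so the third branch of the strategy is
omitted. All names and parameter values are our own.

```python
import random


def simulate_target_strategy(v_target, delta, periods, seed):
    """Simulate the index J_t under the target strategy, case (B).

    Returns the final index and the list of (t, J_t, R_t) recorded at the
    start of each period, where R_t = (1 - delta**t) * v_target.
    """
    rng = random.Random(seed)
    J = 0.0
    path = []
    for t in range(periods):
        R_t = (1 - delta**t) * v_target
        R_next = (1 - delta**(t + 1)) * v_target
        path.append((t, J, R_t))
        if J >= R_next:
            # Ahead of target: "punish" with the static equilibrium (D, R), payoff 0.
            stage_payoff = 0.0
        else:
            # Behind target: play sigma*. Row plays U (payoff 4) with probability
            # 1/4 and D (payoff 2) with probability 3/4, while Col plays L.
            stage_payoff = 4.0 if rng.random() < 0.25 else 2.0
        J += (1 - delta) * delta**t * stage_payoff
    return J, path


final_J, path = simulate_target_strategy(v_target=1.0, delta=0.9, periods=200, seed=0)
print(final_J, min(J_t - R_t for _, J_t, R_t in path))
```

In every run the realized index stays weakly above the target path and ends
within the truncation error of the target itself, whatever the realized
sequence of U's and D's; this is the sense in which the construction replaces
public randomization by history-dependent play.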


Proof of (ii): Next we claim that regardless of how player 1 plays, $J_\infty \le v_1$. If for some history $J_\infty > v_1$, then there is a $T'$ such that $J_t \ge (1-\delta^{t+1})v_1$ for all $t > T'$. Thus $s^*_{-1}(h_t) = \hat\sigma_{-1}$ for all $t > T'$, and since the most player 1 can get when his opponents play $\hat\sigma_{-1}$ is zero, we have that $J_\infty \le J_{T'}$. Let $T$ be the smallest such $T'$, so that $J_{T-1} < (1-\delta^T)v_1$. (Since $J_0 = 0$, $T > 0$.) Then

$$J_\infty \le J_T \le J_{T-1} + \delta^{T-1}(1-\delta)\bar v_1 \le J_{T-1} + \delta^T v_1 < v_1,$$

where we have substituted in our bound on $\delta$. This argument also shows that player 1's payoff in the subgame starting at time $t$ is bounded by $\delta^{-t}(v_1 - J_t)$, and from part (i) this payoff can be attained by following $s_1^*$.


(C): Next we show how to construct equilibria (for large enough $\delta$) that yield payoffs $v_1$ between $\underline{v}_1$ and the static equilibrium payoff of zero. Pick a $v_1 \in (\underline{v}_1, 0)$, and choose $\delta$ large enough that $(1-\delta)\min g_1(\sigma) > v_1$. Then set $J_0 = 0$ and $s^*(0) = m^1$. Now define $J_t(h_t)$ and $s^*(h_t)$ as follows. Set

$$J_t = J_{t-1} + \delta^t g_1(s_1(t), s^*_{-1}(h_t)),$$

and set

$$s^*(h_t) = \begin{cases} \hat\sigma & \text{if } J_t < R_{t+1}, \\ m^1 & \text{if } J_t \ge R_{t+1}. \end{cases}$$

Proceeding as above, we claim that

(i) if player 1 uses strategy $s_1^*$, then $J_t \ge R_t$ for all times $t$ and histories $h_t$, which implies that $J_\infty \ge v_1$,

(ii) that regardless of how player 1 plays, $J_\infty \le v_1$, so player 1's payoff in the subgame starting at time $t$ is no greater than $\delta^{-t}(v_1 - J_t)$, and

(iii) that in subgames where $J_t < R_t$ it is a best response for player one to play $s_1^*(h_t) = \hat\sigma_1$.


Proof of (i): We must show that if player 1 follows $s_1^*$ then for all $t$, $J_t \ge (1-\delta^t)v_1 = R_t$. Since $J_0 = 0$, this is true for $t = 0$. Assume it is true for $t = T$. At period $T$, either (a) $J_T \ge R_{T+1}$, or (b) $J_T < R_{T+1}$.
In case (a),

$$J_{T+1} \ge J_T + \delta^T(1-\delta)\min(g_1) \ge J_T + \delta^T v_1 \quad \text{(from our bound on } \delta\text{)}$$

$$\ge R_T + \delta^T v_1 \quad \text{(by inductive hypothesis)}$$

$$= R_{T+1}.$$

In case (b), $s^*(h_T) = \hat\sigma$, so $g_1(s_1(T), s^*_{-1}(h_T)) = 0$ for all $s_1(T) \in \mathrm{support}\, s_1^*(h_T)$, and $J_{T+1} = J_T \ge R_T \ge R_{T+1}$.

Thus $J_t \ge (1-\delta^t)v_1$ for all $t$, and so if player 1 follows $s_1^*$ then $J_\infty \ge v_1$.


Proof of (ii): We claim that for all strategies of player 1 and all times $t$ and histories $h_t$, $J_t \le (1-\delta^t)v_1 = R_t$. Since $J_0 = 0$, this is true for $t = 0$. Assume it is true for $t = T$. Then at period $T$, either (a) $J_T < R_{T+1}$, or (b) $J_T \ge R_{T+1}$. In case (a), $s^*(h_T) = \hat\sigma$, so $J_{T+1} \le J_T < R_{T+1}$. In case (b), $s^*(h_T) = m^1$, so $\max_{s_1} g_1(s_1, s^*_{-1}(h_T)) = \underline{v}_1$, and

$$J_{T+1} \le J_T + (1-\delta)\delta^T \underline{v}_1 \le (1-\delta^T)v_1 + (1-\delta)\delta^T \underline{v}_1 \quad \text{(by the inductive hypothesis)}$$

$$\le (1-\delta^{T+1})v_1 \quad \text{(from the bound on } \delta\text{)} \;=\; R_{T+1}.$$

Thus $J_t \le (1-\delta^t)v_1$ for all $t$, and so regardless of how player 1 plays, $J_\infty \le v_1$.


(iii) Conditions (i) and (ii) show that in any subgame with $J_t \ge R_t$ player 1 can attain the upper bound of $v_1$ by following $s_1^*$. Now we consider subgames with $J_t < R_t$. If $J_t \le v_1$, then regardless of how player 1 plays, we will have $J_\tau < R_\tau$ for all $\tau \ge t$, so player 1's opponents will play $\hat\sigma_{-1}$ for the remainder of the game. Here it is clearly a best response for player 1 to play $s_1^* = \hat\sigma_1$. If $J_t > v_1$, then by playing $\hat\sigma_1$ player 1 can ensure that $J_\tau \ge R_\tau$ at some $\tau > t$, which ensures that player 1 attains a payoff of $v_1$ in the subgame starting at $t$. If player 1 instead chooses a strategy which assigns positive probability to the event that $J_\tau < R_\tau$ for all $\tau > t$, he can only lower his payoff: the payoff for histories with $J_\infty < R_\infty$ is less than $v_1$, and the payoff for the histories with $J_\infty \ge R_\infty$ is bounded above by $v_1$.

Q.E.D. 


Proposition 4 shows how to attain any payoffs between $\underline{v}_1$ and $v_1^*$ by means of "target strategies." From Proposition 3 we know that such strategies cannot be used to attain higher payoffs. We think that it is interesting to note where an attempted proof would fail.

In part (A), we proved that if player 1 followed $s_1^*$ then for every sequence of realizations player 1's payoff is at least $v_1^*$. Imagine that we try to attain a payoff $v_1 > v_1^*$ by setting the target $R_t = (1-\delta^t)v_1$. Then in the "reward" phases where $\sigma^*$ is played, it might be that player 1's realized payoff is less than $v_1$. (Recall that by definition it cannot be lower than $v_1^*$.) After a sufficiently long sequence of these outcomes, player 1's realized payoff $J_t$ would be so far below the target $R_t$ that even receiving the best possible payoff at every future date would not bring his discounted normalized payoff up to the target.
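The failure described above can be made concrete with a small calculation. The Python sketch below finds the first period after which the target can no longer be reached, assuming (hypothetically) that the player has received the same low payoff in every period so far and would receive the best stage payoff forever after. The function name and all numerical values are illustrative assumptions.

```python
from typing import Optional


def first_unrecoverable_period(
    delta: float,
    target_payoff: float,
    low_payoff: float,
    best_payoff: float,
    max_periods: int = 10_000,
) -> Optional[int]:
    """Smallest k such that, after k periods of low_payoff, even best_payoff
    in every later period leaves the normalized discounted payoff below
    target_payoff. Returns None if no such k <= max_periods exists."""
    for k in range(1, max_periods + 1):
        # Normalized discounted payoff accumulated over periods 0..k-1.
        accumulated = (1.0 - delta ** k) * low_payoff
        # Best case: best_payoff from period k onward.
        best_total = accumulated + delta ** k * best_payoff
        if best_total < target_payoff:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical numbers: target 3, realized reward-phase payoff 2, best payoff 4.
    print(first_unrecoverable_period(0.9, 3.0, 2.0, 4.0))
```

With these numbers the best attainable total after $k$ bad periods is $2 + 2\delta^k$, so the target 3 is out of reach once $\delta^k < 1/2$. The number of periods needed grows as $\delta$ approaches one but stays finite for every $\delta < 1$.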

This  problem  of  going  so  far  below  the  target  that  a  return  is  impossible 
does  not  arise  with  the  criterion  of  time-average  payoffs,  since  the  outcomes 
in  any  finite  number  of  periods  are  then  irrelevant.  For  this  reason  we  can 
attain payoffs above $v_1^*$ under time averaging, as we show in the next section.


5.   Time  Averaging 

The reason that player 1's payoff is bounded by what she obtains when she plays her least favorite strategy in the support of $\sigma_1$ is that every time she plays a different action she must be "punished" in a way that makes all of the actions in the support of $\sigma_1$ equally attractive. A similar need for "punishments" along the equilibrium path occurs in repeated partnership games, where two players make an effort decision that is not observed by the other, and the link between effort and output is stochastic. Since shirking by either player increases the probability of low output, low output must provoke punishment, even though low output can occur when neither player shirks. This is why the best equilibrium outcome is bounded away from efficiency when the payoff criterion is the discounted normalized value (Radner-Myerson-Maskin [1986], Fudenberg-Maskin [1987a]). However, Radner [1986] has shown that efficient payoffs can be attained in partnerships with time averaging. His proof constructed strategies so that (1) if players never cheat, punishment occurs only finitely often, and thus is negligible, and (2) an infinite number of deviations is very likely to trigger a substantial punishment. Since no finite number of deviations can increase the time-average payoff, in equilibrium no one cheats yet the punishment costs are negligible.

Since  the  inefficiencies  in  repeated  partnerships  and  games  with 
short-run  players  both  stem  from  the  need  for  punishments  along  the 
equilibrium  path,  it  is  not  surprising  that  the  inefficiencies  in  our  model 
also  disappear  when  players  are  completely  patient.  We  prove  this  with  a 
variant  of  the  "target  strategies"  we  used  in  Section  4.   These  strategies 
differ from Radner's in that even if player 1 plays the equilibrium strategy,
she  will  be  punished  infinitely  often  with  probability  one.   However,  along 
the  equilibrium  path  the  frequency  of  punishment  converges  to  zero,  so  that  as 
in  Radner  the  punishment  imposes  zero  cost. 


Proposition 5: Imagine that player 1 evaluates payoff streams with the criterion

$$\liminf_{T\to\infty}\; E\,\frac{1}{T}\sum_{t=0}^{T-1} g_1(s(t)).$$

Then for all $v_1 \in V$ there is a subgame-perfect equilibrium with payoffs $v_1$.



Remark: The proof is based on a strong law of large numbers for martingales with independent increments (see footnote 2), which we extend to cover the difference between a supermartingale and its lowest value to date. The relevant limit theory is developed in the Appendix.


Proof: As in Proposition 4, we use different strategies for payoffs above and below some fixed static equilibrium $\hat\sigma$. Imagine that $v_1$ exceeds player 1's payoff in this equilibrium, and normalize $v_1 = 0$. Let $\sigma'$ be the (possibly mixed) strategy in graph($B$) that maximizes player one's expected payoff, and define $g_1(\sigma') = \bar v_1$.

Define $J_0 = 0$ and $J_T = \sum_{t=0}^{T-1} g_1(s_1(t), s_{-1}(t))$. (This differs from the definition in Section 4, where we used player 1's realized action and the mixed strategy of her opponents in defining $J_t$.) Note that player 1's objective function is $\liminf E(1/T)J_T$. Set

$$s^*(h_t) = \begin{cases} \hat\sigma & \text{if } J_t > 0, \\ \sigma' & \text{if } J_t \le 0. \end{cases}$$

We claim that (i) no matter how player 1 plays, her payoff is bounded by $v_1$, and (ii) that by following $s_1^*$ player 1 can attain payoff $v_1$ almost surely (and hence in expectation).

To prove this, let $s_1(h_t)$ be an arbitrary strategy for player 1, and fix the associated probability distribution over infinite-horizon histories. For each history, let $R_t(h_t) = \{\tau \le t-1 \mid s^*(h_\tau) = \sigma'\}$ be the "reward" periods, and let $P_t(h_t) = \{\tau \le t-1 \mid s^*(h_\tau) = \hat\sigma\}$ be the "punishment" ones.

Then let $M_t(h_t) = \sum_{\tau \in R_t} g_1(s(\tau))$ be the sum of player 1's payoffs in the good periods, and set

$$N_t(h_t) = \sum_{\tau \in P_t} g_1(s(\tau)).$$

Note that the reward and punishment sets and the associated scores are defined pathwise, i.e. they depend on the history $h_t$; henceforth, though, we will omit the history $h_t$ from the notation. Finally define $\bar M_t = \max_{\tau \le t} M_\tau$, $\underline{N}_t = \min_{\tau \le t} N_\tau$, and $\underline{v}_1 = \min_s g_1(s)$.


We claim that for all $t$,

(5) $\underline{v}_1 + (M_t - \bar M_t) \le J_t \le \bar v_1 + (N_t - \underline{N}_t)$.

This is clearly true for $t = 0$. Assume (5) holds for all $\tau \le t$. At the start of period $t$, either (a) $J_t > 0$ or (b) $J_t \le 0$. In case (a), $J_{t+1} \ge \underline{v}_1 \ge \underline{v}_1 + (M_{t+1} - \bar M_{t+1})$. Also, since $s^*(h_t) = \hat\sigma$,

$$J_{t+1} = J_t + N_{t+1} - N_t \le \bar v_1 + N_{t+1} - \underline{N}_t \le \bar v_1 + (N_{t+1} - \underline{N}_{t+1}),$$

so that (5) is satisfied. In case (b), $J_{t+1} \le \bar v_1 \le \bar v_1 + (N_{t+1} - \underline{N}_{t+1})$, and

$$J_{t+1} = J_t + M_{t+1} - M_t \ge \underline{v}_1 + M_{t+1} - \bar M_t \ge \underline{v}_1 + (M_{t+1} - \bar M_{t+1}),$$

so once again (5) is satisfied.

Lemma A.3 in the Appendix shows that $(N_t - \underline{N}_t)/t$ converges to zero almost surely. By (5), this implies that $\limsup (1/T)J_T \le 0$ almost surely, and since the per-period payoffs are uniformly bounded, $\limsup (1/T)E(J_T) \le 0$ as well. Lemma A.4 shows that if player 1 plays so that $M_t$ is a submartingale, then $(\bar M_t - M_t)/t$ converges to zero as well. Since this is true when player 1 follows $s_1^*$, the result follows.

Q.E.D. 
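The bookkeeping in this proof can be illustrated by simulation. The Python sketch below is only an illustration under stated assumptions: stage payoffs are drawn independently from hypothetical lists (mean above zero in reward periods, mean below zero in punishment periods), and the bounds in the pathwise inequality are taken to be the smallest and largest stage payoffs. None of the names or numbers come from the paper.

```python
import random
from typing import Sequence, Tuple


def simulate_target_strategy(
    periods: int,
    reward_payoffs: Sequence[int],
    punish_payoffs: Sequence[int],
    seed: int = 0,
) -> Tuple[float, bool]:
    """Simulate the rule 'punish when J > 0, reward when J <= 0'.

    Returns the time-average payoff J_T / T and a flag saying whether
    g_min + (M_t - max M) <= J_t <= g_max + (N_t - min N) held in every period.
    """
    rng = random.Random(seed)
    all_payoffs = list(reward_payoffs) + list(punish_payoffs)
    g_min, g_max = min(all_payoffs), max(all_payoffs)
    J = M = N = 0          # total, reward-period, and punishment-period scores
    M_max = N_min = 0      # running max of M and running min of N (from t = 0)
    bound_ok = True
    for _ in range(periods):
        if J > 0:          # punishment period: static equilibrium is played
            N += rng.choice(list(punish_payoffs))
            N_min = min(N_min, N)
        else:              # reward period
            M += rng.choice(list(reward_payoffs))
            M_max = max(M_max, M)
        J = M + N
        bound_ok = bound_ok and (g_min + (M - M_max) <= J <= g_max + (N - N_min))
    return J / periods, bound_ok


if __name__ == "__main__":
    # Hypothetical payoffs: reward mean +1, punishment mean -1, target 0.
    print(simulate_target_strategy(100_000, [3, -1], [1, -3]))
```

In runs of this sketch the score $J_t$ keeps crossing zero, so punishment periods recur, yet the time average stays close to the normalized target of zero.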


We can show that with our strategies, player 1 is punished infinitely
often ($J_t > 0$) with probability one. This contrasts with Radner's construction
of  efficient  equilibria  for  symmetric  time-average  partnership  games,  where 
the  probability  of  infinite  punishment  is  zero.   It  seems  likely  that  our 
"target-strategy"  approach  provides  another  way  of  constructing  efficient 
equilibria  for  those  games;  it  would  be  interesting  to  know  whether  this  could 
be  extended  to  asymmetric  partnerships.  Our  approach  has  the  benefit  of 
making  more  clear  why  the  construction  cannot  be  extended  to  the  discounting 
case.   It  also  avoids  the  need  to  invoke  the  law  of  the  iterated  logarithm, 
which  may  make  the  proof  more  intuitive,  although  we  must  use  the  strong  law 
for  martingales  in  its  place. 

6.    Several Long Run Players with Unobservable Mixed Strategies

The  case  of  several  long-run  players  is  more  complex,  and  we  have  not 
completely  solved  it.  As  before,  we  can  construct  mixed-strategy  equilibria 
in  which  the  long-run  players  do  better  than  in  any  pure  strategy  equilibrium, 
and  once  again  they  cannot  do  as  well  as  if  their  mixed  strategies  were 
directly  observable.   However,  we  do  not  have  a  general  characterization  of 
the enforceable payoffs. Instead, we offer an example of payoffs that cannot
be enforced, and a very restrictive condition that suffices for
enforceability.

Figure 2 presents a 3-player version of the game in Figure 1. Row's and Col's choices and payoffs are exactly as before. The third player, DUMMY, who is a long-run player, receives 3 if Col plays L and receives 0 otherwise. The feasible payoffs for Row and DUMMY are depicted in Figure 3. Consider the feasible point at which p = 1/2 and Col plays L. Here Row and Dummy both receive 3. The argument of Section 3 shows that Row's best equilibrium payoff is not 3 but 2, which is the minimum payoff over the actions in the support of her mixed strategy. Dummy is not mixing, so Dummy's minimum payoff over the support of her strategy is 3. (Indeed this is the minimum over the support of the product of the two strategies.) Thus one might hope that, by analogy to the proof of Proposition 3, we could show that the payoffs (2,3)

were  enforceable.  But  these  payoffs  are  not  even  feasible!   The  highest 

1  9  G 
Dummy's  payoff  can  be  when  Row's  payoff  is  2  is  2  -r-rrr   .   (See  Figure  3,  which 

/  U  0 

depicts  the  feasible  set.)   The  problem  is  that  an  equilibrium  in  which  Row 
usually  randomizes  must  sometimes  have  Col  play  M  or  R  to  "tax  away"  Row's 
"excess  gains"  from  playing  U  instead  of  D,  and  this  "tax"  imposes  a  cost  on 
Dummy.

             L              M              R

U        4, 3, 0        0, 0, 1       -1, 0, -100

D        2, 3, 0        1, 1, 1        0, 0, 3

Player two is a "dummy"; player three chooses columns.

Figure 2

[Figure 3: Feasible set when player three plays a short-run best response. Horizontal axis: player 1's payoff. Vertical axis: player 2's payoff.]

Figure 3


Next,  consider  the  game  in  Figure  4. 


              Matrix A                    Matrix B                    Matrix C

           L           R              L            R              L           R

U      1, 1, 1     2, 0, 1       1, 1, -99     2, 0, 1       14, 4, 0    14, 2, 0

D      0, 2, 1    -1, -1, -9     0, 2, 1      -1, -1, -1     12, 4, 0    12, 2, 0

Figure 4

In  this  game,  player  1  chooses  Rows,  player  2  chooses  Columns,  and 
player  3  chooses  matrices;  players  1  and  2  are  long-lived  while  player  3  is 
short-lived.   The  unique  one-shot  equilibrium  is  (U,L,A)  with  payoff  (1,1,1). 
The  long-lived  players  can  obtain  a  higher  payoff  if  they  induce  player  3  to 
choose C, which requires both long-run players to use mixed strategies. [Let $p = \mathrm{prob}(U)$ and $q = \mathrm{prob}(L)$; then player 3 chooses C if $(1-p)(1-q) \ge 1/10$ and $pq \ge 1/100$.] For example, if both long-run players use 50-50 randomization the payoffs are (13,3,0). Call this strategy $\sigma = (\sigma_1, \sigma_2)$.

Now let us explain how to enforce (2,2) as an equilibrium payoff. It will be clear that the construction we develop is somewhat more general than the example; we do not give the general version because it does not lead to a complete characterization of the equilibrium payoffs. As in the proofs of Propositions 1 and 2, the strategies we construct depend on the history of the game only through a number of "state variables," with the current state determined by last period's state and last period's outcome through a (commonly known) transition rule. Let D and R be the "first" strategies, denoted $s_1(1)$ and $s_2(1)$, and let U and L be the second, $s_1(2)$ and $s_2(2)$.
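The 50-50 example for the game in Figure 4 can be checked with exact arithmetic. The Python sketch below encodes the condition for player 3 to choose C as it is stated in the text, together with the long-run players' payoffs in matrix C. The cell layout of matrix C and all identifiers are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Long-run players' payoffs (player 1, player 2) in matrix C, keyed by
# (row, column). The cell layout is an assumption read off the figure.
PAYOFFS_C: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("U", "L"): (14, 4),
    ("U", "R"): (14, 2),
    ("D", "L"): (12, 4),
    ("D", "R"): (12, 2),
}


def player3_chooses_c(p: Fraction, q: Fraction) -> bool:
    """Condition from the text: (1-p)(1-q) >= 1/10 and pq >= 1/100,
    where p = prob(U) and q = prob(L)."""
    return (1 - p) * (1 - q) >= Fraction(1, 10) and p * q >= Fraction(1, 100)


def expected_payoffs_c(p: Fraction, q: Fraction) -> Tuple[Fraction, Fraction]:
    """Expected payoffs of players 1 and 2 when player 3 plays C."""
    probs = {
        ("U", "L"): p * q,
        ("U", "R"): p * (1 - q),
        ("D", "L"): (1 - p) * q,
        ("D", "R"): (1 - p) * (1 - q),
    }
    g1 = sum(probs[cell] * PAYOFFS_C[cell][0] for cell in probs)
    g2 = sum(probs[cell] * PAYOFFS_C[cell][1] for cell in probs)
    return g1, g2


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(player3_chooses_c(half, half), expected_payoffs_c(half, half))
```

At $p = q = 1/2$ both conditions hold and the long-run players' expected payoffs are 13 and 3, matching the payoffs (13,3,0) quoted above. With pure strategies at least one of the two products is zero, so player 3 would not choose C.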

Play begins in state 0. In this state, each player plays $\sigma_i$, which gives equal probability weight to his two actions. If player 1 plays action $j$, and player 2 action $k$, the next period state is $(j,k)$. The payoffs when beginning play in state $(j,k)$ are denoted $\alpha(j,k) = (\alpha_1(j,k), \alpha_2(j,k))$. In our construction, each player's continuation payoff will be independent of his opponents' last move, so that $\alpha_1(j,k) = \alpha_1(j)$ and $\alpha_2(j,k) = \alpha_2(k)$. Finally, in each state, the $\alpha_i$'s and the specified transition rule will be such that each player is indifferent between his pure strategies and thus is willing to randomize.

We will find it convenient to first define the $\alpha$'s, and then construct the associated strategies. Set the $\alpha$'s so that

(6) $v_i = (1-\delta)\, g_i(s_i(j), \sigma_{-i}) + \delta\, \alpha_i(j)$.

In our example, $\alpha_1(1)$ solves $2 = (1-\delta)(12) + \delta\alpha_1(1)$, so $\alpha_1(1) = 12 - 10/\delta$. Similar computations yield $\alpha_1(2) = 14 - 12/\delta$, $\alpha_2(1) = 2$, and $\alpha_2(2) = 4 - 2/\delta$.

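Equation (6) is linear in the continuation payoff, so it can be solved directly. The Python sketch below does this with exact fractions and reproduces the two values for player 1, whose stage payoffs against 50-50 mixing are 12 and 14 in the example. The function name and the choice $\delta = 9/10$ are illustrative assumptions.

```python
from fractions import Fraction


def continuation_payoff(v_i: Fraction, stage_payoff: Fraction, delta: Fraction) -> Fraction:
    """Solve v_i = (1 - delta) * stage_payoff + delta * alpha for alpha."""
    return (v_i - (1 - delta) * stage_payoff) / delta


if __name__ == "__main__":
    delta = Fraction(9, 10)  # hypothetical discount factor
    # Target payoff 2; stage payoffs 12 and 14 for player 1's two actions.
    print(continuation_payoff(Fraction(2), Fraction(12), delta))  # 12 - 10/delta
    print(continuation_payoff(Fraction(2), Fraction(14), delta))  # 14 - 12/delta
```

The action with the higher stage payoff receives the lower continuation payoff, which is what makes the player indifferent between the two actions.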
If the observed play is $(s_1(1), s_2(1))$, next period's state is (1,1), with payoffs $\alpha(1,1)$. Here play depends on a public randomization as follows. Choose a point $w$ in the set $P$ of payoffs attainable without private randomizations, and a probability $p \in (0,1)$, such that

(7) $p(1-\delta)w + (1-p)v = (1-\delta p)\,\alpha(1,1)$.

(This is possible for $\delta$ sufficiently near to one. The general version of this construction imposes a requirement that guarantees (7) can be satisfied.) With probability $p$, players play the strategies that yield $w$, and the state remains at (1,1). With complementary probability, they play the mixed strategies $(\sigma_1, \sigma_2)$. The continuation payoffs are exactly as at state 0. Thus, the payoff to player $i$ of choosing strategy $s_i(j)$ is

(8) $p[(1-\delta)w_i + \delta\alpha_i(1)] + (1-p)[(1-\delta)g_i(s_i(j), \sigma_{-i}) + \delta\alpha_i(j)] = p[(1-\delta)w_i + \delta\alpha_i(1)] + (1-p)v_i = \alpha_i(1)$,

for all strategies $s_i(j) \in \mathrm{supp}(\sigma_i)$. Here the first equality uses (6) and the second uses (7). Once again, if any long-run player chooses an action not in the support, play reverts to the static Nash equilibrium.

Now we must specify strategies at states other than (1,1). The state (2,2) occurs if the players chose their most preferred strategies (U,L). Choose a point $w'$ in $P$ and a $p' \in (0,1)$ such that

(9) $p'(1-\delta)w' + (1-p')v = (1-\delta p')\,\alpha(2,2)$.

(Once again this is possible for $\delta$ near to one.) And again, play depends on a public randomization, switching to $w'$ with probability $p'$ and otherwise following $(\sigma_1, \sigma_2)$.

At state (1,2), play switches to a point in $P$ for one period, with a probability chosen so that the normalized payoffs work out to $\alpha(1,2)$. With complementary probability, players once again play the mixed strategies $\sigma$ and the same continuation payoffs are used. State (2,1) is symmetric.

Now let us argue that the constructed strategies are an equilibrium for $\delta$ sufficiently large. First, if there are no deviations, the payoffs starting in state $(j,k)$ are $\alpha(j,k)$. If player $i$ deviates to an action outside of the support of $\sigma_i$, or if either player deviates when the strategies say to play a pure strategy point, the deviation is detected, and play reverts to a static equilibrium. For $\delta$ near to one all of the $\alpha$'s exceed the static equilibrium payoff, and so for sufficiently large $\delta$ no player will choose such a deviation. And, by construction, players are indifferent between the actions in the support of their mixed strategy, if they plan to always conform in the future. Then by the principle of optimality, no arbitrary sequence of unilateral deviations is profitable.

APPENDIX


In this appendix we consider discrete-parameter martingales $\{x_n, F_n\}$, $n = 0, 1, \ldots$, where $\{F_n\}$ is a filtration on an underlying probability space. We assume that $x_0 = 0$.


Lemma A.1. Let $\{x_n, F_n\}$ be a martingale sequence with bounded increments. (That is, for some number $B$, $|x_n - x_{n-1}| \le B$ almost surely.) Then $\lim_{n\to\infty} x_n/n = 0$ almost surely. A proof of this lemma can be found in Hall and Heyde (1980, page 36ff).


We  also  use  the  following  standard  adaptation  of  this  strong  law: 


Lemma A.2. For $\{x_n, F_n\}$ as above, let $X_n = \min_{i \le n} x_i$. Then $\lim_{n\to\infty} X_n/n = 0$ almost surely.


Proof: Since $x_0 = 0$, $X_n \le 0$ for all $n$. Fix a sample of the stochastic process. Since $X_n/n \le 0$, we only have to show that $\liminf X_n/n = 0$. Suppose, instead, that $n_i$ is a subsequence along which the limit is less than 0. For each $n_i$, there is $m_i \le n_i$ with $x_{m_i} = X_{n_i}$, and thus $0 > X_{n_i}/n_i = x_{m_i}/n_i \ge x_{m_i}/m_i$. Since the increments are bounded, the $m_i$ must tend to infinity. Hence, along the subsequence $\{m_i\}$, $x_{m_i}/m_i$ violates the strong law, which can happen only on a null set.


Lemma A.3. Let $\{x_n, F_n\}$ be a supermartingale with bounded increments and with $x_0 = 0$. Let $\{X_n\}$ be defined from $\{x_n\}$ as in Lemma A.2. Then $\lim_{n\to\infty} (x_n - X_n)/n = 0$ almost surely.

Proof: Since $x_n \ge X_n$, we only need to show that the limsup of the sequence is nonpositive. For $n = 1, \ldots$, let $\xi_n = x_n - x_{n-1}$, and let $\zeta_n = \xi_n - E(\xi_n \mid F_{n-1})$. Note that $\zeta_n \ge \xi_n$. Let $y_n = \sum_{i=1}^{n} \zeta_i$, and let $Y_n = \inf\{y_i;\ i = 1, \ldots, n\}$. Then, immediately, $\{y_n, F_n\}$ is a martingale sequence with bounded increments, and Lemmas A.1 and A.2 tell us that $\lim y_n/n = \lim Y_n/n = 0$, and thus $\lim (y_n - Y_n)/n = 0$. We are done, therefore, once we show that $x_n - X_n \le y_n - Y_n$ pointwise. But this is easily done by induction. It is clearly true for $n = 0$ by convention. Assume it holds for $n-1$; then since $\xi_n \le \zeta_n$,

$$x_{n-1} - X_{n-1} + \xi_n \le y_{n-1} - Y_{n-1} + \zeta_n, \quad\text{or}\quad x_n - X_{n-1} \le y_n - Y_{n-1}.$$

If $X_n = X_{n-1}$, then since $Y_{n-1} \ge Y_n$, we are done. While if $X_n \ne X_{n-1}$, then $X_n = x_n$, and $x_n - X_n = 0 \le y_n - Y_n$. Q.E.D.
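The pointwise inequality $x_n - X_n \le y_n - Y_n$ at the heart of this proof can be checked on simulated paths. The Python sketch below uses a simple special case chosen for illustration: the martingale increments $\zeta_n$ are fair coin flips of $\pm 1$, and the supermartingale increments are $\xi_n = \zeta_n - c$ for a constant drift $c \ge 0$. It also assumes that both running minima include the starting value 0. These modelling choices and all identifiers are assumptions, not part of the paper.

```python
import random


def domination_holds(steps: int, drift: int, seed: int = 0) -> bool:
    """Check x_n - min(x) <= y_n - min(y) along one simulated path.

    y is a martingale with +/-1 increments zeta_n; x is the supermartingale
    with increments zeta_n - drift (drift >= 0), so zeta_n >= xi_n.
    """
    rng = random.Random(seed)
    x = y = 0
    x_min = y_min = 0  # running minima, taken to include index 0
    for _ in range(steps):
        zeta = rng.choice((-1, 1))
        y += zeta
        x += zeta - drift
        x_min = min(x_min, x)
        y_min = min(y_min, y)
        if x - x_min > y - y_min:
            return False
    return True


if __name__ == "__main__":
    print(domination_holds(10_000, drift=1), domination_holds(10_000, drift=0))
```

With zero drift the two processes coincide and the inequality holds with equality. With positive drift the supermartingale stays closer to its running minimum than the martingale does to its own.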


A symmetrical argument applies to submartingales, and we obtain:


Lemma A.4. Let $\{x_n, F_n\}$ be a submartingale with bounded increments and $x_0 = 0$. Let $X_n = \max\{x_i \mid i = 1, \ldots, n\}$. Then $\lim_{n\to\infty} (X_n - x_n)/n = 0$ almost surely.




FOOTNOTES 


1. The required discount factor can depend on the payoffs to be attained.


2. We thank Ian Johnstone for pointing us to this result.